
Fast learning optimized prediction methodology for protein secondary structure prediction, relative solvent accessibility prediction and phosphorylation prediction


Abstract

Computational methods are rapidly gaining importance in structural biology, largely because of the explosive progress of genome sequencing projects and the large disparity between the number of known sequences and the number of solved structures. The number of available protein sequences has grown exponentially, while the number of structures has grown much more slowly. There is therefore an urgent need to compute structures for these sequences and to identify their functions. Developing methods that meet these needs both efficiently and accurately is of paramount importance for advances in many biomedical fields, for a better basic understanding of aberrant states of stress and disease, and for drug and biomarker discovery.

Several aspects of secondary structure prediction and other protein structure-related predictions are investigated using different types of information, such as knowledge-based potentials derived from the amino acids in protein sequences, the physicochemical properties of amino acids, and the propensities of amino acids to appear at the ends of secondary structures. Analyzing prediction performance by amino acid type highlights how individual amino acid types influence the formation of secondary structures and points toward ways to make further gains. Other research areas include Relative Solvent Accessibility (RSA) prediction and the prediction of phosphorylation sites, one type of Post-Translational Modification (PTM) site in proteins.

This work predicts protein secondary structures and other protein features efficiently, reliably, at lower cost and with higher accuracy.
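The abstract lists the physicochemical properties of amino acids among the information used to encode sequences but does not describe the encoding itself. A common scheme is a sliding window, in which each residue is represented by the property values of itself and its neighbours. The sketch below is a minimal illustration under stated assumptions: a single property (the Kyte-Doolittle hydropathy scale) stands in for the property database, the default window is 7 residues (`half_width=3`), and chain ends are zero-padded. The helper name `encode_windows` is hypothetical, and none of these settings are taken from the study.

```python
# Kyte-Doolittle hydropathy values, used here as one example property scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_windows(sequence, scale=KYTE_DOOLITTLE, half_width=3, pad_value=0.0):
    """Hypothetical helper: one feature vector per residue, holding the scale
    values of the 2 * half_width + 1 residues centred on it.

    Window positions past either chain terminus, and residues missing from
    the scale (e.g. 'X'), are filled with pad_value.
    """
    n = len(sequence)
    features = []
    for centre in range(n):
        window = []
        for offset in range(-half_width, half_width + 1):
            pos = centre + offset
            if 0 <= pos < n:
                window.append(scale.get(sequence[pos], pad_value))
            else:
                window.append(pad_value)
        features.append(window)
    return features


# With half_width=1, each residue becomes a 3-value vector.
print(encode_windows("MKV", half_width=1))
# prints [[0.0, 1.9, -3.9], [1.9, -3.9, 4.2], [-3.9, 4.2, 0.0]]
```

Vectors from several property scales could be concatenated per position to build the kind of multi-property features that a feature-selection or feature-reduction step would then operate on.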
A novel method called the Fast Learning Optimized PREDiction (FLOPRED) Methodology is proposed for predicting protein secondary structures and other features. It combines knowledge-based potentials, a neural-network-based Extreme Learning Machine (ELM) and advanced Particle Swarm Optimization (PSO) techniques that converge better and faster to produce more accurate results. These techniques yield superior classification of secondary structures: for a small group of 84 proteins, the training accuracy is 93.33% and the testing accuracy is 92.24%, with a standard deviation of 0.48%. The Matthews correlation coefficient for these secondary structures ranges between 80.58% and 84.30%. Accuracies for individual amino acids range between 83% and 92%, with average standard deviations between 0.3% and 2.9% across the 20 amino acids. On a larger set of 415 proteins, the testing accuracy is 86.5%, with a standard deviation of 1.38%. These results are significantly higher than those reported in the literature.

Predicting protein secondary structure from the amino acid sequence is a common step toward predicting a protein's 3-D structure. Additional information, such as the biophysical properties of the amino acids, can help improve secondary structure prediction. Here, a database of protein physicochemical properties supplies the features used to encode protein sequences, and the encoded data are used for secondary structure prediction with FLOPRED. Preliminary studies using a Genetic Algorithm (GA) for feature selection, Principal Component Analysis (PCA) for feature reduction and FLOPRED for classification give promising results.

Some amino acids appear more often at the ends of secondary structures than others.
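One simple way to quantify such end preferences is a propensity ratio: the share of segment-end residues that are a given amino acid, divided by that amino acid's share of all residues. Values above 1 mean the amino acid is enriched at segment ends. The sketch below assumes three-state per-residue labels (H, E, C) and counts the first and last residue of every run, including runs at the chain termini, as ends. These definitions and the helper names are assumptions for illustration, not necessarily those used in the study.

```python
from collections import Counter


def segment_end_indices(structure):
    """Indices of the first and last residue of every run of identical
    labels in a per-residue string such as 'CCHHHHEEC'."""
    ends = set()
    start = 0
    for i in range(1, len(structure) + 1):
        # A run ends at the end of the string or where the label changes.
        if i == len(structure) or structure[i] != structure[start]:
            ends.add(start)
            ends.add(i - 1)
            start = i
    return ends


def end_propensities(sequence, structure):
    """Propensity of each amino acid to sit at a segment end:
    (share of end residues that are aa) / (share of all residues that are aa)."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    ends = segment_end_indices(structure)
    all_counts = Counter(sequence)
    end_counts = Counter(sequence[i] for i in ends)
    n_all, n_ends = len(sequence), len(ends)
    return {aa: (end_counts[aa] / n_ends) / (all_counts[aa] / n_all)
            for aa in all_counts}


# One helix spanning the whole toy chain: both of its ends are glycine.
print(end_propensities("GAAAAG", "HHHHHH"))
# prints {'G': 3.0, 'A': 0.0}
```

In practice the counts would be pooled over a whole training set, and separate ratios could be kept for helix, strand and coil ends.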
A preliminary study indicates that secondary structure accuracy can be improved by as much as 6% by accounting for these end-of-structure preferences for residues at the ends of alpha-helices, beta-strands and coils.

A study of RSA prediction using ELM shows large gains in processing speed compared with support vector machines (SVMs) for classification. This indicates that ELM offers a distinct advantage in processing speed and performance for RSA. Further gains in accuracy are possible when the more advanced FLOPRED algorithm and PSO optimization are applied.

Phosphorylation is a post-translational modification that often controls and regulates the activities of proteins, which makes it an important regulatory mechanism. Phosphorylated sites are often found in intrinsically disordered regions of proteins that lack a unique tertiary structure, so less structural information is available about them. It is therefore important to be able to predict phosphorylation sites computationally in the protein sequences obtained from mass-scale genome sequencing. Knowing the phosphorylation sites may help determine a protein's functions and lead to a better understanding of the mechanisms of protein function in healthy and diseased states. FLOPRED is used to model and predict experimentally determined phosphorylation sites in protein sequences. The new PSO optimization included in FLOPRED enables the prediction of phosphorylation sites with higher accuracy and better generalization.
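The abstract does not spell out FLOPRED's PSO variant. For orientation, the sketch below is a minimal global-best PSO with an inertia weight, the textbook form rather than the advanced variant described here. The function name `pso_minimize` and the parameter values (w=0.7, c1=c2=1.5, 20 particles, 200 iterations) are illustrative assumptions. When PSO is paired with an ELM, one plausible setup is for each particle to encode the hidden-layer weights and biases and for the objective to be a validation error; here the objective is left generic.

```python
import random


def pso_minimize(objective, dim, n_particles=20, iterations=200,
                 bounds=(-5.0, 5.0), w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best PSO with inertia weight w; returns (best_x, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position in the swarm

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward own best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clamp the coordinate to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            value = objective(pos[i])
            if value < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], value
                if value < gbest_val:
                    gbest, gbest_val = pos[i][:], value
    return gbest, gbest_val


def sphere(x):
    """Toy objective whose minimum, 0, is at the origin."""
    return sum(v * v for v in x)


best_x, best_value = pso_minimize(sphere, dim=3)
print(best_x, best_value)
```

The inertia weight trades exploration against convergence speed. Refinements such as time-varying weights or velocity limits are typical of the "advanced" PSO variants, but which ones FLOPRED uses is not stated here.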
Preliminary studies on 984 sequences demonstrate that this model can predict phosphorylation sites with a training accuracy of 92.53%, a testing accuracy of 91.42% and a Matthews correlation coefficient of 83.9%.

In summary, secondary structure, Relative Solvent Accessibility and phosphorylation site prediction have been carried out on multiple data sets, encoded with a variety of information drawn from proteins and from the physicochemical properties of their constituent amino acids. Improved and efficient algorithms called S-ELM and FLOPRED, based on neural networks and Particle Swarm Optimization, are used to classify and predict protein sequences. Analysis of the results provides new insights into the influence of amino acids on secondary structure prediction. S-ELM and FLOPRED have also proven robust and efficient for predicting the relative solvent accessibility of proteins and phosphorylation sites. These studies show that the method is robust and resilient and can be applied for a variety of purposes, and it can be expected to yield higher classification accuracy and better generalization performance than previous methods.

Bibliographic details

  • Author

    Sundararajan, Saraswathi

  • Year: 2011
  • Original format: PDF
  • Text language: English
